Protein Family Classification using Sparse Markov Transducers

Authors

  • Eleazar Eskin
  • William Noble Grundy
  • Yoram Singer
Abstract

In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wildcards in the conditioning sequences. Because substitutions of amino acids are common in protein families, incorporating wildcards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. We also present efficient data structures to improve the memory usage of the models. We evaluate SMTs by building protein family classifiers using the Pfam database and compare our results to previously published results.
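
The idea of conditioning on a context that contains wildcards can be illustrated with a small count-based sketch. This is not the authors' SMT algorithm (the abstract does not give its details); it is a simplified illustration that assumes a single fixed wildcard mask, add-one style smoothing, and hypothetical helper names (`context_key`, `train`, `predict`).

```python
from collections import Counter, defaultdict

WILDCARD = "*"

def context_key(window, mask):
    """Replace every position flagged True in `mask` with a wildcard."""
    return "".join(WILDCARD if wild else ch for ch, wild in zip(window, mask))

def train(sequences, mask):
    """Count which symbol follows each masked context of length len(mask)."""
    counts = defaultdict(Counter)
    k = len(mask)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[context_key(seq[i - k:i], mask)][seq[i]] += 1
    return counts

def predict(counts, window, mask, alphabet, pseudocount=1.0):
    """Smoothed estimate of P(next symbol | masked context)."""
    seen = counts.get(context_key(window, mask), Counter())
    total = sum(seen.values()) + pseudocount * len(alphabet)
    return {a: (seen[a] + pseudocount) / total for a in alphabet}

# Toy "family": the middle residue varies, so it is masked as a wildcard.
mask = (False, True, False)
counts = train(["ACDA", "AEDA", "AKDA"], mask)
probs = predict(counts, "AWD", mask, alphabet="ACDEKW")
```

Because the middle position is masked, the three training sequences share the context `A*D`, and the unseen window `AWD` still matches it. Without the wildcard, each substitution would produce a separate, sparsely observed context.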

Similar resources

Protein Family Classification Using Sparse Markov Transducers

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditio...

Image Classification Based on a Multiresolution Two Dimensional Hidden Markov Model

This paper presents an image classification algorithm using a multiresolution two dimensional hidden Markov model (HMM). The multiresolution two dimensional hidden Markov model is an extension from the two dimensional hidden Markov model for image classification. A classifier estimates model parameters using the EM algorithm. Classification is then performed according to the maximum a posteriori pr...

PIRSF: family classification system at the Protein Information Resource

The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this clas...

Hidden Markov Models for Silhouette Classification

In this paper, a new technique for object classification from silhouettes is presented. Hidden Markov Models are used as a classification mechanism. Through a set of experiments, we show the validity of our approach and show its invariance under severe rotation conditions. Also, a comparison with other techniques that use Hidden Markov Models for object classification from silhouettes is presented.

Joint Video Scene Segmentation and Classification based on Hidden Markov Model

Video classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing large amount of video data. We have developed a scene classification scheme using a Hidden Markov Model (HMM) based classifier. By utilizing the temporal behaviors of different scene classes, HMM classifier can effectively classify video segments into one of the predefined scene classes. In this pa...

Publication date: 2000